correct option
- Law (0.92)
- Information Technology (0.92)
- Government (0.67)
- Education > Educational Setting (0.46)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.96)
- (2 more...)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Appendix A
Q: For what purpose was the dataset created? Q: Who created the dataset (e.g., which team, research group) and on behalf of which entity (e.g., Q: Who funded the creation of the dataset? Q: What do the instances that comprise the dataset represent (e.g., documents, photos, people, Q: How many instances are there in total (of each type, if appropriate)? As shown in Table 1, the dataset statistics are as follows: Grounding Task: 111,770 samples for training, 21,616 samples for testing. For grounding, we use only one annotation per image.
- Asia > Singapore (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- Europe > Monaco (0.04)
- (2 more...)
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
- Education (1.00)
- Health & Medicine > Diagnostic Medicine (0.93)
- Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.93)
Metric-Fair Prompting: Treating Similar Samples Similarly
Wang, Jing, Shen, Jie, Niu, Xing, Zhang, Tong, Weiss, Jeremy
We introduce \emph{Metric-Fair Prompting}, a fairness-aware prompting framework that guides large language models (LLMs) to make decisions under metric-fairness constraints. In the application of multiple-choice medical question answering, each {(question, option)} pair is treated as a binary instance with label $+1$ (correct) or $-1$ (incorrect). To promote {individual fairness}~--~treating similar instances similarly~--~we compute question similarity using NLP embeddings and solve items in \emph{joint pairs of similar questions} rather than in isolation. The prompt enforces a global decision protocol: extract decisive clinical features, map each \((\text{question}, \text{option})\) to a score $f(x)$ that acts as confidence, and impose a Lipschitz-style constraint so that similar inputs receive similar scores and, hence, consistent outputs. Evaluated on the {MedQA (US)} benchmark, Metric-Fair Prompting is shown to improve performance over standard single-item prompting, demonstrating that fairness-guided, confidence-oriented reasoning can enhance LLM accuracy on high-stakes clinical multiple-choice questions.
- North America > United States > Illinois > Champaign County > Urbana (0.04)
- North America > United States > California (0.04)
- Health & Medicine > Therapeutic Area (1.00)
- Government > Regional Government > North America Government > United States Government (1.00)
- Education (1.00)
Appendix A
Q: For what purpose was the dataset created? Q: Who created the dataset (e.g., which team, research group) and on behalf of which entity (e.g., Q: Who funded the creation of the dataset? Q: What do the instances that comprise the dataset represent (e.g., documents, photos, people, Q: How many instances are there in total (of each type, if appropriate)? As shown in Table 1, the dataset statistics are as follows: Grounding Task: 111,770 samples for training, 21,616 samples for testing. For grounding, we use only one annotation per image.
- Law (0.92)
- Information Technology (0.92)
- Government (0.67)
- Education > Educational Setting (0.46)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.96)
- (2 more...)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Generation (0.72)
- Asia > Singapore (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- Europe > Monaco (0.04)
- (2 more...)
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
- Health & Medicine > Diagnostic Medicine (1.00)
- Education (1.00)
- Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.93)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Cognitive Science (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.31)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- Asia > Singapore (0.04)
- Asia > Indonesia > Bali (0.04)
- (16 more...)
- Research Report > New Finding (0.93)
- Research Report > Experimental Study (0.93)